Aligning words using matrix factorisation

Authors

  • Cyril Goutte
  • Kenji Yamada
  • Éric Gaussier
Abstract

Aligning words from sentences which are mutual translations is an important problem in different settings, such as bilingual terminology extraction, Machine Translation, or projection of linguistic features. Here, we view word alignment as matrix factorisation. In order to produce proper alignments, we show that factors must satisfy a number of constraints such as orthogonality. We then propose an algorithm for orthogonal non-negative matrix factorisation, based on a probabilistic model of the alignment data, and apply it to word alignment. This is illustrated on a French-English alignment task from the Hansard.
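The idea of alignment as orthogonal non-negative factorisation can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract describes only as being based on a probabilistic model of the alignment data. Instead, it assumes standard multiplicative-update NMF followed by a hard assignment of each word to its strongest factor, which is one simple way to obtain orthogonal binary factors. The function names, the toy association matrix `M`, and the choice of `k` are all hypothetical.

```python
import numpy as np

def nmf(M, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, M ≈ W @ H, via multiplicative updates (Frobenius loss).

    Stand-in for the paper's probabilistic model, which is not
    specified in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

def orthogonal_alignment(M, k, **kwargs):
    """Factorise M, then assign each word to exactly one factor.

    Returns binary F (source words x k), binary E (target words x k)
    and the binary alignment matrix A = F @ E.T.
    """
    W, H = nmf(M, k, **kwargs)
    n, m = M.shape
    F = np.zeros((n, k))
    F[np.arange(n), W.argmax(axis=1)] = 1.0  # one factor per source word
    E = np.zeros((m, k))
    E[np.arange(m), H.argmax(axis=0)] = 1.0  # one factor per target word
    return F, E, F @ E.T

# Hypothetical association scores: rows are source words, columns target words.
M = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.0, 0.1, 0.0],
    [0.0, 0.7, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.8],
])
F, E, A = orthogonal_alignment(M, k=3)
```

Because every row of `F` and `E` contains a single 1, the columns of each factor are mutually orthogonal, and `A[i, j] = 1` exactly when source word `i` and target word `j` share a factor. This permits many-to-many alignments while keeping each word in a single group.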

Similar resources

Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation

Micro-blogging services, such as Twitter, offer opportunities to analyse user behaviour. Discovering and distinguishing behavioural patterns in micro-blogging services is valuable. However, it is difficult and challenging to distinguish users, and to track the temporal development of collective attention within distinct user groups in Twitter. In this paper, we formulate this problem as trackin...

Automatically learning the units of speech by non-negative matrix factorisation

We present an unsupervised technique to discover the (word-sized) speech units into which a corpus of utterances can be decomposed. First, a fixed-length high-dimensional vector representation of the utterances is obtained. Then, the resulting matrix is decomposed in terms of additive units by applying the non-negative matrix factorisation algorithm. On a small vocabulary task, the obtained basis ...

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

Nonnegative matrix factorisation (NMF) and tri-factorisation (NMTF) methods decompose a given matrix R into two or three smaller matrices so that R ≈ UV^T or R ≈ FSG^T, respectively. Schmidt, Winther and Hansen (2009) introduced a Bayesian version of nonnegative matrix factorisation, which we extend to matrix tri-factorisation...

Probabilistic non-negative matrix factorisation and extensions

Matrix factorisation models have grown explosively in popularity over the last decade, owing to their usefulness in clustering and missing-value prediction. We review the main literature for matrix factorisation, focusing on nonnegative matrix factorisation and probabilistic approaches. We also consider several extensions: matrix tri-factorisation, tensor factorisation, T...

Bayesian Hybrid Matrix Factorisation for Data Integration

Contents: 1 Models; 1.1 Matrix factorisation models; 1.2 Matrix factorisation with ARD and importance values; 1.3 Hybrid matrix factorisation model; 1.3.1 Model definition; 1.3.2 Gibbs sampler ...

Publication date: 2004